A New Design of High-Performance Large-Scale GIS Computing at a Finer Spatial Granularity: A Case Study of Spatial Join with Spark for Sustainability
Abstract
Sustainability research faces many challenges because environmental, urban and regional contexts are changing rapidly at an unprecedentedly fine spatial granularity, which produces massive, growing datasets and requires faster detection of spatial relationships. Spatial join is a fundamental method for making data more informative with respect to spatial relations, and the dramatic growth of data volumes has increased the focus on high-performance, large-scale spatial join. In this paper, we present Spatial Join with Spark (SJS), a high-performance algorithm that uses a simple but efficient uniform spatial grid to partition datasets and joins the partitions with the built-in join transformation of Spark. SJS utilizes the distributed in-memory iterative computation of Spark and introduces a calculation-evaluating model and an in-memory spatial repartition technique, which optimize the initial partition by evaluating the calculation amount of local join algorithms without any disk access. We compare four in-memory spatial join algorithms in SJS for further performance improvement. Based on extensive experiments with real-world data, we conclude that SJS outperforms the Spark and MapReduce implementations of earlier spatial join approaches. This study demonstrates that it is promising to leverage high-performance computing for large-scale spatial join analysis. The availability of large geo-referenced datasets, together with high-performance computing technology, creates great opportunities for sustainability research on whether and how these new trends in data and technology can be used to help detect the associated trends and patterns in human-environment dynamics.
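The uniform-grid partitioning idea mentioned in the abstract can be illustrated with a small, single-machine sketch. This is not the authors' SJS implementation and does not use Spark. It only assumes that each geometry is reduced to a bounding box (xmin, ymin, xmax, ymax), that boxes are replicated into every grid cell they overlap, and that each cell is joined locally with a nested loop. The function names, the cell_size parameter, and the reference-point rule used to drop duplicate pairs are illustrative assumptions, not details taken from the paper.

```python
from collections import defaultdict
from itertools import product


def cells_for(box, cell_size):
    """Return every (col, row) grid cell overlapped by box = (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = box
    cols = range(int(xmin // cell_size), int(xmax // cell_size) + 1)
    rows = range(int(ymin // cell_size), int(ymax // cell_size) + 1)
    return product(cols, rows)


def intersects(a, b):
    """True if two closed bounding boxes share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def partition(dataset, cell_size):
    """Group (key, box) records by grid cell, replicating boxes that span cells."""
    parts = defaultdict(list)
    for key, box in dataset.items():
        for cell in cells_for(box, cell_size):
            parts[cell].append((key, box))
    return parts


def grid_spatial_join(left, right, cell_size):
    """Join two {key: box} datasets on bounding-box intersection, cell by cell."""
    left_parts = partition(left, cell_size)
    right_parts = partition(right, cell_size)
    results = []
    # Only cells populated on both sides can produce output pairs.
    for cell in left_parts.keys() & right_parts.keys():
        for lkey, lbox in left_parts[cell]:
            for rkey, rbox in right_parts[cell]:
                if not intersects(lbox, rbox):
                    continue
                # Assumed duplicate-avoidance rule: report a pair only in the
                # cell holding the lower-left corner of the intersection.
                ref_cell = (
                    int(max(lbox[0], rbox[0]) // cell_size),
                    int(max(lbox[1], rbox[1]) // cell_size),
                )
                if ref_cell == cell:
                    results.append((lkey, rkey))
    return sorted(results)


if __name__ == "__main__":
    parcels = {"a": (0, 0, 15, 15), "b": (30, 30, 35, 35)}
    zones = {"x": (5, 5, 12, 12), "y": (14, 14, 31, 31), "z": (50, 50, 60, 60)}
    print(grid_spatial_join(parcels, zones, cell_size=10))
```

In a distributed setting the per-cell groups would play the role of partitions keyed by cell id, so that a key-based join brings matching cells together before the local join runs. How SJS evaluates and repartitions overloaded cells is not specified in the abstract and is not modeled here.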
Related resources
Spatial Evaluation of Energy Performance at Neighborhood Scale Case study: Sanandaj city
Climate change has become a challenge with adverse impacts on the Earth. Reducing the use of fossil fuels is a primary step toward solving environmental problems. As the population continues to rise and demand for construction, which accounts for a large share of energy consumption, keeps growing, efforts to make the built environment more energy efficient are crucial. The main objective of this research is to eval...
An Effective High-Performance Multiway Spatial Join Algorithm with Spark
Multiway spatial join plays an important role in GIS (Geographic Information Systems) and their applications. With the increase in spatial data volumes, the performance of multiway spatial join has encountered a computation bottleneck in the context of big data. Parallel or distributed computing platforms, such as MapReduce and Spark, are promising for resolving the intensive computing issue. P...
High-Performance Spatial Join Processing on GPGPUs with Applications to Large-Scale Taxi Trip Data
Spatially joining GPS-recorded locations with infrastructure data, such as points of interest, road networks, land cover and different types of zones, and assigning each point to its nearest polyline or polygon, is a prerequisite for trip-related analysis, which is becoming increasingly important in ubiquitous computing. However, existing spatial databases and GIS are incapable of handling large-...
Spatial analysis of distribution and access to urban services at the level of urban neighborhoods with a spatial justice approach (Case study: Commercial uses of Ardabil city)
One of the most important and urgent issues of urban planning is the equitable distribution of facilities and services and their accessibility to citizens at the urban level. Economic and commercial centers, including banks and financial institutions, are among the most important economic sectors of cities and can have sustained social, economic, physical, and environmental impacts on neighborhoods. The...
Common Spatial Patterns Feature Extraction and Support Vector Machine Classification for Motor Imagery with the SecondBrain
Recently, a large set of electroencephalography (EEG) data is being generated by several high-quality labs worldwide and is free to be used by all researchers in the world. On the other hand, many neuroscience researchers need these data to study different neural disorders for better diagnosis and evaluating the treatment. However, some format adaptation and pre-processing are necessary before ...
Publication date: 2016